Introduction to HPC

Scientific Programming with Python

Manuel Holtgrewe

Berlin Institute of Health at Charité

Session Overview

Aims

  • Learn about important applications for Python in scientific programming …
  • … in the biomedical/life sciences.

Considered Applications

  • “Data science wrangling” with polars
  • Fast numerical computations with SciPy/Numpy
  • Interfacing with R for Statistics
  • Machine learning with scikit-learn
  • Deep Learning with Tensorflow

Juptyer Notebooks

  • Introduction
  • Installation and Usage

Introduction

  • What is it
  • Is it useful?

Installation

Using Juptyer Notebook

some points

screenshot

Using Jupyter Labs

some points

screenshot

Data Wrangling with Polars

  • Introduction
  • Loading and Writing Data
  • “Tidy Data” with Polars
  • Data Visualization

Introduction

  • Polars features
  • Polars vs pandas

Loading and Writing Data (1)

Loading Data

Loading and Writing Data (2)

Writing Data

Tidy Data with Polars (1)

What is tidy data?

R tidyverse

Tidy Data with Polars (2)

tidypolars

Data Visualization (1)

Plotly Express

Data Visualization (2)

Vega

Data Visualization (3)

Bokeh

Data Visualization (4)

It’s quite intimidating. How to get started.

Conversion Between Polars and Pandas (1)

  • why
  • install dependencies

Conversion Between Polars and Pandas (2)

  • example code & explanation

Conversion Between Polars and Pandas (3)

  • example code & explanation

Numerical Computation with SciPy/NumPy

  • Introduction
  • Input and Output
  • Arrays and Shapes
  • Vectorized Operations

Introduction to SciPy/NumPy

NumPy

  • Numerical computing data structures
    • high-performance vectors/matrices
  • Numerical computing algorithms
    • vectorized operations, random numbers …
    • linear algebra

SciPy

  • Fundamental scientific algorithms
    • clustering
    • interpolation / smoothing
    • statistics
    • sparse matrices
    • image and signal processing
    • FFT, integration

👉 Mostly low level characteristics, many other libraries build on top.

Input and Output

  • numph array files
  • compression
  • more than one array

Arrays and Shapes

  • vectors
  • matrices
  • data types

Vectorized Operations (1)

Example

Vectorized Operations (2)

Benchmark vs. Python List

Interfacing with R for Statistics

  • Introduction
  • Low-Level Approaches
  • Using rpy2

Introduction

  • why?
  • how?

Low-Level Approaches

  • via disk
  • call R scripts from Python

Using rpy2 (1)

data transfer vs pandas and data frames

Using rpy2 (2)

Some examples:

  • Student’s t-test
  • Kolmogorov-Smirnov test
  • Fisher test

Using rpy2 (3)

  • Data frame operations

Using rpy2 (4)

  • dplyr & tidyr

Using rpy2 (5)

  • ggplot2

Using rpy2 (6)

  • RMagics

Machine Learning scikit-learn

  • Introduction
  • Estimator Cheat Sheet
  • Example: Clustering
  • Example: Regression
  • Example: Classification

Introduction

Estimator Cheat Sheet

Example: Clustering

Example: Regression

Example: Classification

  • Introduction
  • Installation
  • “TensorFlow 2 Quickstart for Beginners”
  • Running on the HPC

Introduction

Installation

“TensorFlow 2 Quickstart for Beginners”

https://www.tensorflow.org/tutorials/quickstart/beginner

Running on the HPC

Deep Learning with Tensorflow

Bring Your Own Project

🫵 Where can you apply what you have learned in your PhD project?

This is not the end…

… but all for this session

Recap

  • Overview of Python in scientific programming
  • Data wrangling with Polars
  • Numerical computations with SciPy/NumPy
  • Interfacing with R for Statistics
  • Machine Learning with scikit-learn
  • Deep Learning with Tensorflow